AI Assistant Configuration

Overview

Qarbine can use one of several Generative AI services to answer questions through “completions/inferences” and obtain embedding vectors. The latter can be used to locate similar content based on comparing their vectors in various vector savvy databases. Multiple “AI Assistants” can be configured within the same or across many Gen AI services.

ChatGPT, from OpenAI, is a natural language processing (NLP) tool driven by artificial intelligence that allows human-like conversations with a chatbot. The "GPT" stands for generative pre-trained transformer, which is a type of large language model. The interface can answer questions and assist with tasks such as composing database queries or summarizing text.

Several Qarbine GenAI integrated services follow the Open AI API interface pattern. These include:

  1. Open AI,
  2. Microsoft Azure Open AI,
  3. Mistral,
  4. Fireworks AI,
  5. Jina AI, and
  6. Perplexity (only completions).

A similar set of functionality is available from these additional supported services:

  1. Anthropic (only completions),
  2. AWS Bedrock,
  3. Cohere,
  4. Google AI, and
  5. Hugging Face.

The services which support multi-modal image embeddings include:

  • AWS Bedrock and
  • Hugging Face.

There are databases such as MongoDB, Neo4j, Couchbase, Milvus, and Pinecone which provide vector index searching that utilize embeddings. The configuration of each service is discussed in more detail below.

If the AI Assistant plug-in is installed but there is no corresponding configuration information set by the Qarbine administrator, then the following log entries are created in ~/.pm2/logs/main-out.log:

AI Assistant- Requires System.servicesOnly setting of aiAssistant. There may be a syntax error.
AiAssistantFunctions postStartup AI Assistant. The plugin settings must be an array.
Please see the configuration documentation.
AI Assistant 1.0 AI Assistant. The plugin settings must be an array. Please see the configuration documentation.

Qarbine Administration Tool Interactions

Qarbine supports multiple AI endpoints including Open AI, Azure AI, Google AI, and AWS Bedrock. To configure Qarbine access to your AI endpoints, open the Qarbine Administration tool.

Click on the Settings tab.

  

Expand the row by clicking the highlighted arrow.

  

Right click and choose

  

Scroll down to the new entry.

The general form of this setting is

aiAssistant = [
{ entry 1 },
{ entry n }
]

There is only one setting for the “aiAssistant” variable. You define as many AI endpoints as appropriate and then refer to them by their alias. Qarbine supports multiple simultaneous AI assistants. Users are presented with the configured alias names for selection. A single Gen AI endpoint service can have multiple entries (with different aliases!) and use varying models and other settings across them.

The result of the expression MUST be an array. The entry structures vary by type of the endpoint as described in more detail below.

"alias" : "userFacingName",
"type" : "string",See below for possible values.
"active" : true | false,The default is true.
"isDefault" : true | falseThe default is false.
"model1" : "string",
"model2" : "string",
"temperature" : number,
"topK" : number,
"topP" : number,
. . .

The recognized type values are "AzureAI", "OpenAI", "AWS_Bedrock", "GoogleAI", and "GeneralAI". The "model1" fields refer to completion (completion/inference) oriented parameters, the "model2" fields refer to text embedding parameters, and the "model3" fields refer to image embeddings. For completions (inferences) the following parameters are commonly available. Refer to each specific service for details.

temperature: Adjusts the "sharpness" of the probability distribution. Higher temperature (greater than 1) results in more randomness; lower temperature (closer to 0) results in more deterministic outputs. The default is 20.

topK: Restricts the model to pick from the k most likely words, adding diversity without extreme randomness. The value of k may need to be adjusted in different contexts. Common values are in the range of 40-100. The default is 1.

topP: Restricts the model to pick the next word from a subset of words whose cumulative probability is just below p. This may add more diversity than top-k because the number of words considered can vary. P is a float between 0 and 1; in practice it is typically between 0.7 and 0.95. The default is 0.7.

maxGeneratedTokenCount: Restricts the response to the given number of tokens. For AWS Bedrock this is the maxTokenCount parameter.

otherOptions1: JSON of anything else to add to the payload for a completion endpoint. The default endpoint arguments are the messages and model. Common option field names are request_timeout, timeout, max_tokens, temperature, frequency_penalty and presence_penalty.
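
For example, a hedged sketch of an otherOptions1 value using some of the common fields above (the specific values are illustrative only, not recommendations):

"otherOptions1" : {
"max_tokens" : 512,
"temperature" : 0.2,
"request_timeout" : 30
}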

The common embedding options are shown below.

baseURI2: The default for baseURI2 is the baseURI1 value.

otherOptions2: The JSON for anything else to add to the payload for an embedding endpoint. The default arguments are input and model.
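
Putting the pieces together, below is a hedged sketch of a complete setting that defines two assistants against the same Gen AI service (the aliases, models, and temperature value are illustrative):

aiAssistant = [
{
"type" : "OpenAI",
"alias" : "fastDrafts",
"isDefault" : true,
"apiKey" : "YOUR_API_KEY",
"model1" : "gpt-3.5-turbo"
},
{
"type" : "OpenAI",
"alias" : "carefulAnswers",
"apiKey" : "YOUR_API_KEY",
"model1" : "gpt-4",
"temperature" : 0.2
}
]

Users would then see the "fastDrafts" and "carefulAnswers" alias names when selecting an assistant.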

Important Considerations

Editing

The standard settings listing field is somewhat small for this setting, so click inside the entry field and then press Control-E to open a larger editor.

  

Alias Naming

The alias value must not contain any spaces. Use camelCase style (helloWorld) for multiple word aliases.

Data Structure Sharing for Query Definition Interactions

For the Data Access AI Assistant, ONLY your data structure information is sent to the endpoint; this is fundamentally needed when the endpoint is asked to format a query for data or to explain an existing query. None of your underlying data itself is sent to the endpoint as part of the prompting context. Users must still be cautious about their free form input, which is sent to the endpoint as part of the interaction lifecycle.

Broader Information Sharing

The display of query results and template generated report results provide options to interact with an AI Assistant. The general dialog is shown below.

  

For query results the toolbar menu option is highlighted below.

  

The text options are shown below.

  

For template results the toolbar menu option is highlighted below.

  

The text options are shown below.

  

The template and query results may contain content that your company policy does not allow to be sent to a 3rd party AI service. The setting below controls access to these menu options.

allowResultInteractionsWithAiAssistant = false

It is defined within the Administration Tool’s Settings tab. Define the value in the section shown below.

  

The default is false. A sample entry enabling it is shown below.

  
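
In text form, the enabling entry is simply:

allowResultInteractionsWithAiAssistant = true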

End user interactions to Qarbine endpoints are logged for Qarbine Administrator review. This includes the AI Assistant service functionality. End user browsers are only presented with the AI Assistant aliases, types, and their default status.

Embedding Cache Parameters

The embedding values for a given AI Assistant are generally constant for a given model. These values, which are arrays of floating point numbers, can optionally be cached by Qarbine to avoid consuming your endpoint tokens. The syntax for these settings is shown below.

aiAssistant = [
{option: 'cacheEmbeddings', value: true},
{option: 'maxCacheTextEmbeddingLength', value: 90},
{ entry 1
{option: 'cacheEmbeddings', value: true},
},
{ entry n }
]

There is a global cacheEmbeddings setting and optionally a per entry cacheEmbeddings setting. The cache is within the Qarbine host's folder ~/qarbine.service/aiAssistant.

Open AI Configuration

Overview

The Qarbine Administrator must set the Open AI API key in the Qarbine Administration tool.

All interactions with OpenAI use their built-in moderation filters by prefixing prompts with “content-filter-alpha”. For more information see the details at https://platform.openai.com/docs/guides/moderation. There are considerations such as rate limits which can be reviewed at https://platform.openai.com/docs/guides/rate-limits. To view an account’s usage status visit the OpenAI website at https://openai.com/ and log in to your account using the credentials you used during the registration process. Once you are logged in, navigate to your account dashboard. OpenAI tokens are consumed by the API key’s organization.
See https://help.openai.com/en/articles/4936856-what-are-tokens-and-how-to-count-them
and https://learn.microsoft.com/en-us/azure/ai-services/openai/overview#tokens.

Open AI Console Interactions

See the Open AI website for details on obtaining an API key. A starting page is at https://openai.com/blog/openai-api.

Qarbine Administration Tool Interactions

To configure Qarbine access to the Open AI endpoint, open the Qarbine Administration tool as described at the top of this document. Create or edit the aiAssistant setting.

Check the Read only box and fill in the entry as shown below.

"type": "OpenAI",
"alias" : "myOpenAi",
"isDefault" : true,
"baseURI1" : "YOUR_BASE_URI",
"apiKey" : "YOUR_API_KEY",
"model1" : "YOUR_INFERENCE_MODEL",
"model2" : "YOUR_EMBEDDING_MODEL",
"path1" : "SUBPATH_COMPLETION",
"path2" : "SUBPATH_EMBEDDING"

The default baseURI1 ("U", "R", "I", digit one) value is https://api.openai.com.

The model1 value for inferences (completions) is optional; the default is gpt-3.5-turbo.
See https://platform.openai.com/docs/api-reference/making-requests. Specify ‘None’ to not pass it as a payload argument with the input. This can be the case when path1 has the model within it.

The model2 value for embeddings is optional; the default is text-embedding-ada-002.
See https://platform.openai.com/docs/guides/embeddings. Specify ‘None’ to not pass it as a payload argument with the input. This can be the case when path2 has the model within it.

If the model is part of the path (path1 or path2) then use 'None' as the specification value.

The optional path1 and path2 values are appended to the baseURI1 value to form the final HTTP endpoints. These can be left out for standard Open AI interactions.

There are two endpoint options that are primarily used for Open AI compatible services as described below. The default endpoint1 for completions is /v1/chat/completions. The default endpoint2 for embeddings is /v1/embeddings.
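
Putting the defaults together, a minimal working Open AI entry might look like the following sketch (the API key is a placeholder, and the model values simply restate the documented defaults):

{
"type": "OpenAI",
"alias" : "myOpenAi",
"isDefault" : true,
"apiKey" : "YOUR_API_KEY",
"model1" : "gpt-3.5-turbo",
"model2" : "text-embedding-ada-002"
}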

To save this click

  

The API key is only visible within the Qarbine compute nodes. It is not visible in client browsers. The setting does not take effect until the next restart of the compute node with the Qarbine AI Assistant plugin. On the main Qarbine compute node, the plugins to load are listed in the file ~/qarbine.service/config/service.NAME.json.

Open AI Compatible Services

Overview

Some 3rd parties offer Open AI API compatible services. These use a different base URI than the default Open AI one. When using such services be sure to set the baseURI1, model1 and model2 values appropriately for the 3rd party compatible service. A model1 value of 'default' passes no model at all to the completions service.

The apiKey and baseURI1 values are used for completions and the apiKey2 and baseURI2 values are used for embeddings. The default baseURI2 is whatever the value is for baseURI1.

A portion of a Mistral AI example specification is shown below.

{
"type" : "GeneralAI",
"alias" : "myMistral",
"isDefault" : true,
"apiKey" : "YOUR_API_KEY***",
"model1" : "pixtral",
"baseURI1" : "https://api.mistral.ai"
}

For clarity, the last entry is "baseURI1" ("base", "U", "R", "I", one). Verify that commas separate the values and that, from a JSON perspective, there is no extra trailing comma.

Fireworks AI

General information can be found at https://fireworks.ai/. The API Key can be found in your cloud console at https://fireworks.ai/api-keys. Sample parameters are shown below.

"baseURI1" : "https://api.fireworks.ai/inference",
"alias" : "myFireworks",
"apiKey" : "YOUR_API_KEY",
"model1" : "accounts/fireworks/models/llama-v2-7b-chat",
"model2" : "intfloat/e5-mistral-7b-instruct"

A listing of models can be found at https://fireworks.ai/models.

Jina AI

General information can be found at https://jina.ai/. Jina.ai has different keys and base URI values for completions vs. embeddings, so the apiKey2 parameter is used. Sample parameters are shown below.

"alias" : "myJina",
"apiKey" : "YOUR_API_KEY",
"apiKey2" : "YOUR_EMBEDDING_API_KEY",
"baseURI1" : "https://api.chat.jina.ai",
"baseURI2" : "https://api.jina.ai"

A tutorial on using Jina.ai can be found at https://www.mongodb.com/developer/products/atlas/jina-ai-semantic-search/

Mistral AI

General information can be found at https://mistral.ai/. Information on the Mistral API can be found at https://docs.mistral.ai/api/.

Use the following endpoint values:

"alias" : "myMistral",
"baseURI1" : "https://api.mistral.ai",

Information on the supported models can be found at https://docs.mistral.ai/models.
For completion interactions, additional otherOptions1 values to consider are safe_prompt and random_seed. Do not specify "stream: true"!
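
For example, a hedged sketch (the values shown are illustrative):

"otherOptions1" : {
"safe_prompt" : true,
"random_seed" : 42
}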

Nomic AI

General information can be found at https://nomic.ai. Nomic only supports embeddings at this time. The required Nomic API key can be accessed by signing on to your Nomic dashboard at https://atlas.nomic.ai/ and then clicking the “API KEYS” tab. For documentation see https://docs.nomic.ai/index.html. Use the following setting values:

"type" : "GeneralAI",
"alias" : "myNomic",
"baseURI1" : "None",
"baseURI2" : "https://api-atlas.nomic.ai",
"path2" : "/v1/embedding/text",
"path3" : "/v1/embedding/image",
"apiKey" : "YOUR_NOMIC_KEY",
"model1" : "None",No completion support.
"model2" : "nomic-embed-text-v1",
"model3" : "nomic-embed-vision-v1.5",
"embedWhatField" : "texts"

The embedWhatField is required. If it is missing, then Nomic raises an "error 422 Unprocessable Entity" error.

Perplexity AI

General information can be found at https://perplexity.ai. Perplexity only supports completions at this time. For information on available models see https://docs.perplexity.ai/docs/model-cards. A discussion of them is at https://blog.perplexity.ai/blog/introducing-pplx-online-llms. Use the following basic setting values:

"type" : "GeneralAI",
"alias" : "myPerplexity",
"baseURI1" : "https://api.perplexity.ai",
"endpoint1" : "/chat/completions",Different than the Open AI default.

A sample completion model value is

"model1" : "llama-2-70b-chat",

Since Perplexity does not support embeddings, set this parameter:

"baseURI2" : "None",

Some otherOptions1 parameters to consider are presence_penalty and frequency_penalty. For API details see https://docs.perplexity.ai/reference/post_chat_completions.
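
As a hedged sketch (the values are illustrative only):

"otherOptions1" : {
"presence_penalty" : 0,
"frequency_penalty" : 1
}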

Anthropic AI

Overview

Anthropic only provides completion support. See https://www.anthropic.com/ for more general information, and see https://docs.anthropic.com/claude/docs/embeddings for embeddings information.

Anthropic Configuration

Sign into your Anthropic account.

Navigate to the following page to obtain an API key.

Click

  

Enter a name for the key.

  

Click

  

A dialog appears.

  

Click

  

Click

  

Qarbine Administration Tool Interactions

To configure Qarbine access to the Anthropic AI endpoint, open the Qarbine Administration tool as described at the top of this document. Create or edit the aiAssistant setting.

Check the Read only box and fill in the value template shown below.

"type": "Anthropic",
"alias" : "myAnthropic",
"apiKey" : "YOUR API KEY",
"model1" : "YOUR COMPLETION MODEL",

The default for model1 is “claude-2.1” and the default maxTokens is 1024.
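
For illustration, a minimal entry relying on those defaults might look like the sketch below (the key is a placeholder; treating maxTokens as an overridable entry field is an assumption based on the documented default):

{
"type": "Anthropic",
"alias" : "myAnthropic",
"apiKey" : "YOUR API KEY",
"model1" : "claude-2.1",
"maxTokens" : 1024
}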

To save this click

  

The API key is only visible within the Qarbine compute nodes. It is not visible in client browsers. The setting does not take effect until the next restart of the compute node with the Qarbine AI Assistant plugin. On the main Qarbine compute node, the plugins to load are listed in the file ˜/qarbine.service/config/service.NAME.json.

Azure Cognitive Services Configuration

Overview

For more information see https://azure.microsoft.com/en-us/products/ai-services/openai-service and https://learn.microsoft.com/en-us/azure/ai-services/openai/quickstart.

You need several parameters to interact with the Azure Open AI services:

Endpoint: This value can be found in the Keys & Endpoint section when examining your resource from the Azure portal. Alternatively, you can find the value in the Azure AI Studio > Playground > Code View. An example endpoint is https://docs-test-001.openai.azure.com/ where "docs-test-001" is the name of the Azure resource.

API key: This value can be found in the Keys & Endpoint section when examining your resource from the Azure portal. You can use either KEY1 or KEY2.

Deployment ID: This is the name of your deployed model.

Azure Portal Interactions

Endpoint and API Key Determination

Sign on to portal.azure.com. On the home page search for “openai” as shown below.

  

Click the gray area to open the Azure AI services page.

  

If you have no OpenAI deployments then you will be presented with that status.

Creating an OpenAI Deployment

If there are none then the first time through click the button shown below.

  

Fill in the fields.

  

  

Click

  

Choose your network configuration.

  

Click

  

Set any tags.

  

Click

  

Review the settings.

Click

  

Wait

  

Soon you will see

  

Click

  

Accessing the API Keys

On the right hand side of the page note the endpoint URL.

  

Click the highlighted link

  

This opens the page with the heading similar to the one below.

  

The resource name in this example is “azureaihelper”.

Note the fields below.

  

Copy one of the KEY values by clicking on   .

Store it somewhere temporarily.
Copy the endpoint by clicking on   . For example,

https://azureaihelper.openai.azure.com/

Also store it somewhere temporarily. You will need your endpoint and access key for authenticating your Azure Open AI requests by Qarbine.

First Model Deployment Creation

A model is required by the OpenAI service. In the left hand gutter area click

  

Note the message
  
Click

  

If you have none then the first time through you will see

  

Click

  

Fill in the fields

  

Note the deployment name. It is needed for Azure Open AI interactions by Qarbine.

Click

  

You should then see

  

The page updates to show the deployment.

  

A separate deployment is required for the Qarbine embedding support. The Qarbine macro function embeddings(string) returns the vectors for the given string. These vectors can subsequently be used in a vector aware query.
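
For example, a hedged usage sketch (the text is hypothetical; the result is an array of floating point numbers suitable for a vector index search):

embeddings("running shoes with good arch support")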

Based on the steps above, create another deployment using the text-embedding-ada-002 model.

Shown below are two sample deployments in the resource named azureaihelper.

  

Qarbine Administration Tool Interactions

To configure Qarbine access to the AI endpoint, open the Qarbine Administration tool as described at the top of this document. Create or edit the aiAssistant setting.

Check the Read only box and fill in the value template shown below.

"type": "AzureAI",
"alias" : "myAzure",
"isDefault" : true,
"apiKey" : "YOUR API KEY",
"resource1" : "YOUR COMPLETION RESOURCE",
"deployment1" : "YOUR COMPLETION DEPLOYMENT",
"resource2" : "YOUR EMBEDDING RESOURCE",
"deployment2" : "YOUR EMBEDDING DEPLOYMENT"

In our example the completion and embedding resource names are the same.
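
For instance, using the "azureaihelper" resource from the walkthrough above, an entry might look like the following sketch (the deployment names are hypothetical; use the names you noted when creating your deployments):

{
"type": "AzureAI",
"alias" : "myAzure",
"isDefault" : true,
"apiKey" : "KEY1_OR_KEY2_VALUE",
"resource1" : "azureaihelper",
"deployment1" : "myCompletionDeployment",
"resource2" : "azureaihelper",
"deployment2" : "myEmbeddingDeployment"
}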

To save this click

  

The API key is only visible within the Qarbine compute nodes. It is not visible in client browsers. The setting does not take effect until the next restart of the compute node with the Qarbine AI Assistant plugin. On the main Qarbine compute node, the plugins to load are listed in the file ~/qarbine.service/config/service.NAME.json.

AWS Bedrock Configuration

Overview

To read about embeddings in AWS Bedrock see
https://docs.aws.amazon.com/bedrock/latest/userguide/what-is-bedrock.html

The completion model used is anthropic.claude-v2. Its parameters include a maximum of 5000 tokens, temperature of 0.5, and top_k of 250.

The embedding model used is “amazon.titan-embed-text-v1”. For embedding configuration see
https://docs.aws.amazon.com/bedrock/latest/userguide/embeddings.html.

Accessing the Model

For the completion service you must have access to the AWS Claude Anthropic model. Navigate to

Click

Getting Started

At the prompt

  

Click

Manage model access

Click

  

Click

  

Fill in your details in

  

Apply LLMs to the broad use case of helping to query AWS NoSQL databases and help author AWS NoSQL database queries. We also integrate with legacy SQL databases. Qarbine amplifies customer ROI for AWS NoSQL databases by providing a suite of analysis and reporting tools which are modern data savvy.

Click

  

Review

  

Click

  

Click

  

Wait

  

Click

  

If necessary, wait for the model to be available. Once the model is available in Bedrock, you can use it to play around in the AWS console with Chat, Text and Image sections, depending on the requested model. You can also configure Qarbine’s interaction.

For embedding, you must have access to the amazon.titan-embed-text-v1 model.
See https://docs.aws.amazon.com/bedrock/latest/userguide/embeddings.html.

Defining AWS IAM Credentials

Use AWS Identity and Access Management (IAM) to create a user with Bedrock access. The associated sample policy details are shown below.

  

Next, obtain the AWS access key ID and secret access key credentials to configure Qarbine to interact with Bedrock.

Qarbine Administration Tool Interactions

To configure Qarbine access to the AWS Bedrock endpoint, open the Qarbine Administration tool as described at the top of this document. Create or edit the aiAssistant setting.

Check the Read only box and fill in the value template shown below.

"type": "AWS_Bedrock",     
"alias" : "myBedrock",
"isDefault" : true,
"accessKeyId" : "AKIA******",
"secretAccessKey" : "8QRXh2*****",
"region" : "us-east-1",
"model1": “xxx”,The completion default is "anthropic.claude-v2".
"maxTokensToSample" : number,
"model2": “yyy”,The embedding default is "amazon.titan-embed-text-v1".
"model3": “yyy”,Image embedding default is “amazon.titan-embed-image-v1".

For information on model parameters see:

To save this click

  

The AWS credentials are only visible within the Qarbine compute nodes. They are not visible in client browsers. The setting does not take effect until the next restart of the compute node with the Qarbine AI Assistant plugin. On the main Qarbine compute node, the plugins to load are listed in the file ~/qarbine.service/config/service.NAME.json.

Cohere

Overview

For general information on Cohere see https://cohere.com/.

Cohere Portal Interactions

The Cohere API keys can be found within your account at

Qarbine Administration Tool Interactions

To configure Qarbine access to the endpoints, open the Qarbine Administration tool as described at the top of this document. Create or edit the aiAssistant setting.

Check the Read only box and fill in the basic template shown below.

aiAssistant = [
{
"type": "Cohere",
"alias" : "myCohere",
"apiKey" : "YOUR API TOKEN"
}
]

The default baseURI1 is "https://api.cohere.ai". Note there is no trailing slash.

The default for the completions model1 value is "/v1/generate". The default for the embedding model2 value is "/v1/embed". Note the leading slashes.

The optional model1 is appended to the baseURI1 value to form the final HTTP endpoint. The optional model2 is appended to the baseURI1 value to form the final HTTP endpoint.
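
Spelled out with the documented defaults made explicit, an equivalent entry might look like this sketch:

aiAssistant = [
{
"type": "Cohere",
"alias" : "myCohere",
"apiKey" : "YOUR API TOKEN",
"baseURI1" : "https://api.cohere.ai",
"model1" : "/v1/generate",
"model2" : "/v1/embed"
}
]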

To save this click

  

The API key is only visible within the Qarbine compute nodes. It is not visible in client browsers. The setting does not take effect until the next restart of the compute node with the Qarbine AI Assistant plugin. On the main Qarbine compute node, the plugins to load are listed in the file ~/qarbine.service/config/service.NAME.json.

Google AI (Gemini) Configuration

Overview

Google AI provides both completion and embedding services. The endpoints integrated into Qarbine use Gemini which is Google's largest and most capable AI model. For details see https://deepmind.google/technologies/gemini/#introduction.

Google Portal Interactions

The Google AI API keys can be obtained from Google AI Studio at the following URL,
https://makersuite.google.com/app/apikey. Details on the API can be found at https://ai.google.dev/docs.

Qarbine Administration Tool Interactions

To configure Qarbine access to the AI endpoint, open the Qarbine Administration tool as described at the top of this document. Create or edit the aiAssistant setting.

Check the Read only box and fill in the value template shown below.

"type": "GoogleAI",
"alias" : "myGoogle",
"apiKey" : "YOUR API KEY",
"model1" : "YOUR COMPLETION MODEL",
"model2" : "YOUR EMBEDDINGS MODEL"]

The default baseURI1 value is "https://generativelanguage.googleapis.com/v1beta".

The list of Gemini models can be found at https://ai.google.dev/models/gemini. The default model1 value is "models/gemini-pro". The default model2 value is "models/embedding-001".
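
For illustration, an entry that simply restates those defaults might look like the following sketch (the key is a placeholder):

{
"type": "GoogleAI",
"alias" : "myGoogle",
"apiKey" : "YOUR API KEY",
"model1" : "models/gemini-pro",
"model2" : "models/embedding-001"
}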

To save this click

  

The API key is only visible within the Qarbine compute nodes. It is not visible in client browsers. The setting does not take effect until the next restart of the compute node with the Qarbine AI Assistant plugin. This can be done from the Qarbine Administrator tool. On the main Qarbine compute node, the plugins to load are listed in the file ~/qarbine.service/config/service.NAME.json.

Gradient Configuration

Overview

Gradient provides both general purpose and industry specific large language models. In addition, it can manage private models. For more information see https://gradient.ai/.

Gradient Portal Interactions

The Gradient access token and workspace identifiers required by the Qarbine configuration can be found on the page at https://auth.gradient.ai/select-workspace.

Qarbine Administration Tool Interactions

To configure Qarbine access to the AI endpoint, open the Qarbine Administration tool as described at the top of this document. Create or edit the aiAssistant setting.

Check the Read only box and fill in the value template shown below.

aiAssistant = [
{
"type": "Gradient",
"alias" : "myGradient",
"isDefault" : true,
"apiKey" : "YOUR ACCESS TOKEN",
"workspace" : "YOUR WORKSPACE_ID",
"model1" : "YOUR COMPLETION MODEL",
"model1Format" : "MODEL_FORMAT",
"model2" : "YOUR SLUG ID",
}
]

The value for model1 is the model identifier from the page at https://docs.gradient.ai/docs/models-1. Below are sample values from that page. The default is for the Llama-2 7B language model.

Language Model: Model ID (Slug ID)

Bloom-560m: 99148c6d-c2a0-4fbe-a4a7-e7c05bdb8a09_base_ml_model (bloom-560m)
Llama-2 7B: f0b97d96-51a8-4040-8b22-7940ee1fa24e_base_ml_model
Llama-2 13B: (see the models page for its identifiers)
Nous Hermes 2: cc2dafce-9e6e-4a23-a918-cad6ba89e42e_base_ml_model (nous-hermes2)

The model1Format value is needed to properly format the request to match the appropriate base model used for fine-tuning. The following slightly generic values may be used: "Llama" and "Nous Hermes". The default is "Llama". For more information see https://docs.gradient.ai/docs/cli-quickstart#-generating-completions-from-your-model

The model2 value is used for embeddings. Its default value is “bge-large”.


To save this click

  

The API key is only visible within the Qarbine compute nodes. It is not visible in client browsers. The setting does not take effect until the next restart of the compute node with the Qarbine AI Assistant plugin. On the main Qarbine compute node, the plugins to load are listed in the file ~/qarbine.service/config/service.NAME.json.

Hugging Face

Overview

For general information on Hugging Face see https://huggingface.co/.

Hugging Face Portal Interactions

The Hugging Face access tokens can be found within your account at https://huggingface.co/settings/profile.

Qarbine Administration Tool Interactions

To configure Qarbine access to the Hugging Face endpoints, open the Qarbine Administration tool as described at the top of this document. Create or edit the aiAssistant setting.

Check the Read only box and fill in the value template shown below.

"type": "HuggingFace",
"alias" : "myHuggingFace",
"apiKey" : "YOUR ACCESS TOKEN",
"baseURI1" : "aaa",
"baseURI2" : "...",
"model1" : "the completion model",
"model2" : "the text embedding model",
"model3" : "the image embedding model",

The default baseURI1 is "https://api-inference.huggingface.co". Note there is no trailing slash.

The default for baseURI2 is the baseURI1 value.

The default for model1 is "/models/mistralai/Mistral-7B-v0.1". The default for model2 is "/pipeline/feature-extraction/sentence-transformers/all-MiniLM-L6-v2". The default model3 is "/models/google/vit-base-patch16-384". Note the leading slashes. The base URI values and these model values form the service call URL.

The default "otherOptions2" value is {wait_for_model: true}. This is to reduce problems with an error such as {"error":"Model sentence-transformers/all-MiniLM-L6-v2 is currently loading","estimated_time":20.0}.

Observed Caveats

Depending on your subscription, you may receive

Model is too large to load onto the free Inference API. To try the model, launch it on
Inference Endpoints instead. See https://ui.endpoints.huggingface.co/welcome.

This is the case for a model such as gradientai/v-alpha-tross. Albatross is a collection of domain-specific language models for finance applications developed by Gradient. For blogs on this model see https://gradient.ai/blog/albatross-responsible-ai and
https://gradient.ai/blog/alphatross-llm-now-available-on-hugging-face.

With the completion URL "https://api-inference.huggingface.co/models/mistralai/Mistral-7B-v0.1" asking "What is the capital of France" returned

[{'generated_text': 'What is the capital of France?\n\nParis is the capital of France.\n\n
What is the capital of the United States?\n\nWashington, D.C. is the capital of the United
States.\n\nWhat is the capital of the United Kingdom?\n\nLondon is the capital of the
United Kingdom.\n\nWhat is the capital of Canada?\n\nOttawa is the capital of
Canada.\n\nWhat is the capital of Australia?\n\nCanberra is the capital of Australia'}]

Lower casing the country name and asking "What is the capital of france" returned

[{'generated_text': 'What is the capital of france?\n\nParis is the capital of France. It is
located in the north of the country, on the river Seine. It is the largest city in France, with
a population of over 2 million people. Paris is known for its beautiful architecture,
including the Eiffel Tower, the Louvre Museum, and the Arc de Triomphe. It is also a
major center for fashion, art, and culture.\n\nWhat is the capital of france in french?\n'}]